Wrapping Data into XML

Authors

  • Wei Han
  • David Buttler
  • Calton Pu
Abstract

The vast majority of information that is available online, and that will come online in the near future, is available only in HTML. To use this information for more than human browsing, it must be converted into a machine-readable format. Wrappers have been the key tool for converting HTML into semantically meaningful, well-structured XML data. However, developing wrappers is slow and tedious work, and the results are typically brittle. This paper describes XWRAP Elite, a tool that automatically generates robust wrappers. It breaks the conversion process into three procedures: discovering where the data is located in an HTML page and separating the data into individual objects; decomposing objects into data elements; and marking objects and elements in an output format. XWRAP Elite automates the first two procedures and requires minimal human involvement in marking output data. In addition, a code generation component packages all of the pieces into a stand-alone wrapper.
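
The three procedures above can be illustrated with a deliberately simple Python sketch using only the standard library. It is not XWRAP Elite's algorithm: the abstract does not describe how objects are discovered, so this sketch assumes that each data object is a row of an HTML table containing `<td>` cells and that each cell is one data element. The names `RowCollector`, `to_xml`, and `make_wrapper`, the tag names, and the sample page are all hypothetical. The tag names passed in stand for the human-supplied output marking, and `make_wrapper` is only a loose analogue of the code generation component, since it returns a closure rather than emitting wrapper source code.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class RowCollector(HTMLParser):
    """Steps 1 and 2 (assumed heuristic): each <tr> containing <td> cells is one
    data object, and the text of each <td>/<th> cell is one data element.
    Assumes flat, well-formed tables; nested tables are not handled."""

    def __init__(self):
        super().__init__()
        self.rows = []        # extracted objects, each a list of element strings
        self._row = None      # cells of the row currently being parsed
        self._cell = None     # text fragments of the cell currently being parsed
        self._has_td = False  # whether the current row has at least one <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell, self._has_td = [], None, False
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            self._has_td = self._has_td or tag == "td"

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace so each element is a clean string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._has_td:  # rows made only of <th> cells are treated as headers
                self.rows.append(self._row)
            self._row, self._cell = None, None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def to_xml(rows, root_tag, object_tag, element_tags):
    """Step 3: mark objects and elements with tag names supplied by a person.
    zip() drops cells beyond the number of tag names given."""
    root = ET.Element(root_tag)
    for row in rows:
        obj = ET.SubElement(root, object_tag)
        for name, value in zip(element_tags, row):
            ET.SubElement(obj, name).text = value
    return root


def make_wrapper(root_tag, object_tag, element_tags):
    """Package the three steps into one stand-alone callable (hypothetical
    analogue of code generation: it returns a closure, not source code)."""
    def wrapper(html_text):
        parser = RowCollector()
        parser.feed(html_text)
        parser.close()
        return ET.tostring(
            to_xml(parser.rows, root_tag, object_tag, element_tags),
            encoding="unicode",
        )
    return wrapper


# Usage on a small, made-up page.
page = """
<html><body><table>
  <tr><th>City</th><th>Temp</th></tr>
  <tr><td>Atlanta</td><td>31 C</td></tr>
  <tr><td>Boston</td><td>24 C</td></tr>
</table></body></html>
"""
weather = make_wrapper("cities", "city", ["name", "temperature"])
result = weather(page)
print(result)
```

Running it prints a `<cities>` element with one `<city>` per data row; the header row is skipped because it contains only `<th>` cells. The table-row assumption is what makes this sketch brittle in exactly the way the abstract describes, which is why a real wrapper generator needs far more robust object discovery.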

Similar resources

Wrapping Web Pages into XML Documents

The notion of wrapping a web server into XML documents is driven by the need for structured data that can be used by a variety of applications. The web contains vast amounts of information that is useless to most applications since it is mainly targeting a human audience. A solution to this would be to automate the browsing process and then convert the extracted information into a more suitab...

Countering Wrapping Attack on XML Signature in SOAP Message for Cloud Computing

It is known that the exchange of information between web applications is done by means of the SOAP protocol. Securing this protocol is obviously a vital issue for any computer network. In cloud computing systems, however, the issue becomes even more sensitive, because the system's clients release their data to the cloud. XML signature is employed to secure SOAP messages. However, there...

Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach

A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web sites and transforming it into a structured data format, such as XML. The resulting data can then be used to build new applications without having to deal with unstructured data. The advantages of our wrapping technology...

A Machine Learning Approach to Accurately and Reliably Extracting Data from the Web

A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web sites and transforming it into a structured data format, such as XML. The resulting data can then be used to build new applications without having to deal with unstructured data. The advantages of our wrapping technology...

XSpRES - Robust and Effective XML Signatures for Web Services

XML Encryption and XML Signature are fundamental security standards forming the core of many applications that need to process XML-based data. Due to the increased usage of XML in distributed systems and platforms such as SOA and Cloud settings, the demand for robust and effective security mechanisms has increased as well. Recent research work discovered, however, substantial vulnerabilitie...

Publication date: 2001